Indirect Match Highlights Detection with Deep Convolutional Neural Networks
Highlights in a sports video are usually defined as actions that stimulate
excitement or attract the attention of the audience. Considerable effort is spent
designing techniques that find highlights automatically, in order to
automate the otherwise manual editing process. Most state-of-the-art
approaches tackle the problem by training a classifier on
information extracted from the TV-like framing of players on the game
pitch, learning to detect game actions that human observers have labeled
as highlights according to their perception. This labeling is a long and
expensive process. In this paper, we reverse the paradigm: instead of looking at
the gameplay and inferring what could be exciting for the audience, we directly
analyze the audience behavior, which we assume is triggered by events happening
during the game. We apply a deep 3D Convolutional Neural Network (3D-CNN) to
extract visual features from cropped video recordings of the supporters
attending the event. The outputs of the crops belonging to the same frame are
then accumulated to produce a value indicating the Highlight Likelihood (HL),
which is then used to discriminate between positive samples (i.e., when a highlight
occurs) and negative samples (i.e., standard play or time-outs). Experimental
results on a public dataset of ice-hockey matches demonstrate the effectiveness
of our method and encourage further research in this new and exciting direction.
Comment: "Social Signal Processing and Beyond" workshop, in conjunction with
ICIAP 201
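The abstract says only that per-crop outputs of the same frame are "accumulated" into a Highlight Likelihood and then used to separate highlights from standard play. A minimal sketch, assuming the accumulation is a plain mean over crop scores in [0, 1] and that a fixed threshold decides the label (both the mean and the 0.5 threshold are hypothetical choices, not taken from the paper):

```python
import numpy as np


def highlight_likelihood(crop_scores):
    """Accumulate per-crop scores of one frame into a single HL value.

    crop_scores: per-crop outputs in [0, 1] (hypothetical 3D-CNN probabilities
    that the crop shows an excited audience). Averaging is an assumption.
    """
    scores = np.asarray(crop_scores, dtype=float)
    return float(scores.mean())


def classify_frames(frames_crop_scores, threshold=0.5):
    """Return the HL of every frame and a boolean highlight label per frame.

    `threshold` is a hypothetical value; in practice it would be tuned on data.
    """
    hl = np.array([highlight_likelihood(s) for s in frames_crop_scores])
    return hl, hl > threshold
```

Usage: `classify_frames([[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]])` gives HL values of about 0.8 and 0.2, so only the first frame is labeled a highlight. Averaging keeps HL in the same range as the crop scores regardless of how many supporters are cropped per frame; a sum or a max would be equally plausible readings of "accumulated".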
Keller--Osserman conditions for diffusion-type operators on Riemannian Manifolds
In this paper we obtain generalized Keller-Osserman conditions for wide
classes of differential inequalities on weighted Riemannian manifolds of the
form and , where is a non-linear diffusion-type operator.
Prototypical examples of these operators are the -Laplacian and the mean
curvature operator. While we concentrate on non-existence results, in many
instances the conditions we describe are in fact necessary for non-existence.
The geometry of the underlying manifold does not affect the form of the
Keller-Osserman conditions, but is reflected, via bounds for the modified
Bakry-Emery Ricci curvature, by growth conditions for the functions and
. We also describe a weak maximum principle related to inequalities of
the above form which extends and improves previous results valid for the
φ-Laplacian.
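For orientation, the classical Euclidean statement that this line of work generalizes concerns the Laplacian alone. The sketch below states it for f continuous, positive and nondecreasing, and works out two standard examples; it is background for the reader, not the weighted, gradient-dependent condition obtained in the paper.

```latex
F(t) = \int_0^t f(s)\,ds, \qquad
\Delta u \ge f(u) \text{ has no entire solution on } \mathbb{R}^m
\iff \int^{+\infty} \frac{dt}{\sqrt{F(t)}} < +\infty .

f(t) = e^t:\quad F(t) = e^t - 1, \qquad
\int_1^{+\infty} \frac{dt}{\sqrt{e^t - 1}} < +\infty
\quad (\text{the integrand behaves like } e^{-t/2}).

f(t) = t^q \ (t > 0):\quad F(t) = \frac{t^{q+1}}{q+1}, \qquad
\int_1^{+\infty} t^{-(q+1)/2}\,dt < +\infty \iff q > 1 .
```

So Δu ≥ e^u has no entire solution, and for power-type nonlinearities the integral condition reduces to superlinear growth, q > 1.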
On the 1/H-flow by p-Laplace approximation: new estimates via fake distances under Ricci lower bounds
In this paper we show the existence of weak solutions of the inverse mean curvature flow starting from a relatively
compact set (possibly, a point) on a large class of manifolds satisfying Ricci
lower bounds. Under natural assumptions, we obtain sharp estimates for the
growth of the solution and for the mean curvature of its level sets, which are well
behaved with respect to Gromov-Hausdorff convergence. The construction follows
R. Moser's approximation procedure via the p-Laplace equation, and relies on
new gradient and decay estimates for p-harmonic capacity potentials, notably
for the kernel of the p-Laplacian. These bounds, stable as p → 1, are achieved by studying fake distances associated to capacity
potentials and Green kernels. We conclude by investigating some basic
isoperimetric properties of the level sets of the solution.
Comment: 61 pages. Revised version. Section 3.2 (properness under volume
doubling and weak Poincaré inequalities, pp. 41-45) was rewritten, and the
main Theorems 1.4 and 4.6 changed accordingly
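The link between p-harmonic functions and the inverse mean curvature flow in Moser's procedure rests on a logarithmic change of variables. The short derivation below is a standard computation written out for the reader (the notation u_p, w_p is ours, not the paper's): it shows why p-harmonic potentials, in the limit p → 1, formally produce the level-set form of the 1/H-flow.

```latex
u_p = e^{-w_p/(p-1)} \quad\Longleftrightarrow\quad w_p = (1-p)\log u_p ,

\nabla u_p = -\frac{u_p}{p-1}\,\nabla w_p , \qquad
|\nabla u_p|^{p-2}\nabla u_p = -\frac{e^{-w_p}}{(p-1)^{p-1}}\,|\nabla w_p|^{p-2}\nabla w_p ,

\Delta_p u_p = -\frac{e^{-w_p}}{(p-1)^{p-1}}
\Big( \Delta_p w_p - |\nabla w_p|^{p} \Big),

\Delta_p u_p = 0 \iff \Delta_p w_p = |\nabla w_p|^{p}
\ \xrightarrow[\;p \to 1\;]{\text{formally}}\
\operatorname{div}\!\Big(\frac{\nabla w}{|\nabla w|}\Big) = |\nabla w| .
```

The last equation says that the mean curvature of the level sets of w equals |∇w|, i.e. the level sets move with speed 1/H; this is why estimates that remain stable as p → 1 are the key ingredient.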
Ricci almost solitons
We introduce a natural extension of the concept of gradient Ricci soliton:
the Ricci almost soliton. We provide existence and rigidity results, we deduce
a priori curvature estimates and isolation phenomena, and we investigate some
topological properties. A number of differential identities involving the
relevant geometric quantities are derived. Some basic tools from weighted
manifold theory, such as general weighted volume comparisons and maximum
principles at infinity for diffusion operators, are discussed.
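For reference, the defining equations can be written as follows (standard formulation; X is a vector field, f a function and λ a smooth function on M). A gradient Ricci soliton is the special case in which λ is constant, and the weighted-manifold viewpoint rewrites the gradient equation through the Bakry-Emery Ricci tensor.

```latex
\operatorname{Ric} + \tfrac{1}{2}\,\mathcal{L}_X g = \lambda\, g ,
\qquad \lambda \in C^\infty(M) \quad \text{(Ricci almost soliton)},

X = \nabla f:\qquad \operatorname{Ric} + \nabla^2 f = \lambda\, g
\quad \text{(gradient case; a Ricci soliton if } \lambda \text{ is constant)},

\operatorname{Ric}_f := \operatorname{Ric} + \nabla^2 f, \qquad
\Delta_f u := \Delta u - \langle \nabla f, \nabla u \rangle
\quad \text{on } (M, g, e^{-f}\,d\mathrm{vol}) .
```

In this notation the gradient Ricci almost soliton equation reads Ric_f = λ g, which is why volume comparisons and maximum principles for the diffusion operator Δ_f are natural tools.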
F-formation Detection: Individuating Free-standing Conversational Groups in Images
Detecting groups of interacting people is a useful
task in many modern technologies, with application fields ranging from
video surveillance to social robotics. In this paper we first provide a
rigorous definition of group grounded in the social sciences:
this allows us to specify many kinds of groups so far neglected in the Computer
Vision literature. On top of this taxonomy, we present a detailed review of the state of the
art in group detection algorithms. Then, as our main contribution, we present
a new method for the automatic detection of groups in still images, which
is based on a graph-cuts framework for clustering individuals; in particular, we
codify in computational terms the sociological definition of
F-formation, which is well suited to encoding a group using only proxemic
information: the position and orientation of people. We call the proposed method
Graph-Cuts for F-formation (GCFF). We show that GCFF outperforms all
state-of-the-art methods in terms of different accuracy measures (some of
them newly introduced), while also demonstrating strong robustness to noise and
versatility in recognizing groups of various cardinality.
Comment: 32 pages, submitted to PLOS On
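An F-formation is characterized by people orienting toward a shared space (the o-space), which is why position and orientation alone can encode a group. A minimal sketch of that idea, assuming each person "votes" for an o-space centre a fixed `stride` in front of them and that people whose centres nearly coincide form a group. The clustering here is a simple single-link union-find stand-in, not the graph-cuts optimization of GCFF, and `stride` and `radius` are hypothetical values:

```python
import numpy as np


def ospace_centers(positions, orientations, stride=1.0):
    """Candidate o-space centre for each person: `stride` metres along the facing direction.

    orientations are in radians; `stride` is a hypothetical value, not one from the paper.
    """
    pos = np.asarray(positions, dtype=float)
    theta = np.asarray(orientations, dtype=float)
    offsets = stride * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return pos + offsets


def group_people(positions, orientations, stride=1.0, radius=0.5):
    """Group people whose candidate centres lie within `radius` metres of each other.

    Single-link clustering with union-find, for illustration only; it is NOT the
    graph-cuts step used by GCFF. `radius` is hypothetical.
    """
    centers = ospace_centers(positions, orientations, stride)
    parent = list(range(len(centers)))

    def find(i):
        # Follow parent links to the root, halving the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if np.linalg.norm(centers[i] - centers[j]) <= radius:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive group ids in order of first appearance
    mapping = {}
    return [mapping.setdefault(find(i), len(mapping)) for i in range(len(centers))]
```

Usage: two people two metres apart facing each other, plus a distant third person, give `group_people([(0, 0), (2, 0), (10, 10)], [0.0, np.pi, 0.0])` equal to `[0, 0, 1]`. Unlike a graph-cuts formulation, this greedy rule cannot trade off competing group hypotheses globally, which is one reason a principled optimization is preferable.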
The Visual Social Distancing Problem
One of the main and most effective measures to contain the recent viral
outbreak is the maintenance of so-called Social Distancing (SD). To comply
with this constraint, workplaces, public institutions, public transport and schools
will likely adopt restrictions on the minimum inter-personal distance between
people. Given the current scenario, it is crucial to measure
compliance with such physical constraints at scale, in order to understand
why such distance limits may be broken and whether
this poses a threat given the scene context. All of this must be done while complying
with privacy policies and keeping the measurement acceptable. To this end, we
introduce the Visual Social Distancing (VSD) problem, defined as the automatic
estimation of the inter-personal distance from an image, and the
characterization of the related people aggregations. VSD is pivotal for a
non-invasive analysis of whether people comply with the SD restriction, and for
providing statistics about the level of safety of specific areas whenever this
constraint is violated. We then discuss how VSD relates to previous
literature in Social Signal Processing and indicate which existing Computer
Vision methods can be used to address this problem. We conclude with future
challenges related to the effectiveness of VSD systems, ethical implications
and future application scenarios.
Comment: 9 pages, 5 figures. All the authors contributed equally to this
manuscript and are listed in alphabetical order. Under submission
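One common way to estimate inter-personal distance from a single image is to map each person's foot point onto the ground plane and measure distances there. The sketch below assumes a calibrated 3x3 image-to-ground homography `H` is available and that foot points are already detected; both assumptions, and the 1-metre default threshold, are hypothetical and not a method prescribed by the paper:

```python
import numpy as np


def image_to_ground(points_px, H):
    """Map image foot points (u, v) to ground-plane coordinates in metres.

    H is a 3x3 image-to-ground homography, assumed known from calibration.
    """
    pts = np.asarray(points_px, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]  # divide by the homogeneous coordinate


def distance_violations(points_px, H, min_dist=1.0):
    """Return index pairs (i, j) whose ground-plane distance is below min_dist metres.

    min_dist is a hypothetical threshold; actual SD rules vary by context.
    """
    ground = image_to_ground(points_px, H)
    pairs = []
    for i in range(len(ground)):
        for j in range(i + 1, len(ground)):
            if np.linalg.norm(ground[i] - ground[j]) < min_dist:
                pairs.append((i, j))
    return pairs
```

Usage: with a toy homography `np.diag([0.01, 0.01, 1.0])` (one pixel per centimetre), foot points at pixels 0, 50 and 300 along a row give `[(0, 1)]` as the only violating pair. Measuring on the ground plane rather than in pixels avoids the perspective bias that makes distant people look closer together.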
MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses
Recent approaches to trajectory forecasting use tracklets to predict the
future positions of pedestrians with Long Short-Term Memory (LSTM)
architectures. This paper shows that adding vislets, that is, short sequences
of head pose estimations, significantly increases trajectory
forecasting performance. We then propose to use vislets in a novel framework
called MX-LSTM, which captures the interplay between tracklets and vislets through
a joint unconstrained optimization of full covariance matrices during LSTM
backpropagation. At the same time, MX-LSTM predicts future head poses,
extending the standard capabilities of long-term trajectory forecasting
approaches. With standard head pose estimators and an attention-based social
pooling, MX-LSTM sets a new state of the art in trajectory forecasting on all
the considered datasets (Zara01, Zara02, UCY, and TownCentre), with a large
margin when pedestrians slow down, a case where most forecasting
approaches struggle to provide an accurate solution.
Comment: 10 pages, 3 figures. To appear in CVPR 201
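"Unconstrained optimization of full covariance matrices" requires a parameterization in which any real-valued network output yields a valid (positive definite) covariance. A minimal sketch of one common choice, assuming a Cholesky factor with exponentiated diagonal and a Gaussian negative log-likelihood as the training loss; the Cholesky parameterization and the 4-D joint vector (e.g. position plus a 2-D head-orientation vector) are assumptions, not details confirmed by the abstract:

```python
import numpy as np


def params_to_cholesky(raw, dim):
    """Build a lower-triangular Cholesky factor L from an unconstrained vector.

    raw has dim*(dim+1)//2 real entries; diagonal entries go through exp(), so
    Sigma = L @ L.T is positive definite for ANY raw (assumed parameterization).
    """
    L = np.zeros((dim, dim))
    rows, cols = np.tril_indices(dim)
    L[rows, cols] = raw
    diag = np.arange(dim)
    L[diag, diag] = np.exp(L[diag, diag])
    return L


def gaussian_nll(x, mu, raw, dim):
    """Negative log-likelihood of x under N(mu, L L^T), the kind of loss an LSTM could minimize."""
    L = params_to_cholesky(raw, dim)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    z = np.linalg.solve(L, diff)                 # z = L^{-1} diff, so z @ z = diff^T Sigma^{-1} diff
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log det Sigma
    return 0.5 * (z @ z + log_det + dim * np.log(2.0 * np.pi))
```

Because positive definiteness holds by construction, gradients can flow through all 10 raw parameters of a 4-D covariance without projection or clipping, which is what lets the cross-correlations between position and head pose be learned jointly.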
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
In this work, we explore the correlation between people's trajectories and
their head orientations. We argue that trajectory and head pose
forecasting can be modelled as a joint problem. Recent approaches to trajectory
forecasting leverage short-term trajectories (a.k.a. tracklets) of pedestrians to
predict their future paths. In addition, sociological cues, such as expected
destination or pedestrian interaction, are often combined with tracklets. In
this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between
positions and head orientations (vislets) through a joint unconstrained
optimization of full covariance matrices during LSTM backpropagation. We
additionally exploit head orientations as a proxy for visual attention
when modeling social interactions. MX-LSTM predicts future pedestrian locations
and head poses, extending the standard capabilities of current approaches
to long-term trajectory forecasting. Compared to the state of the art, our
approach performs better on an extensive set of public benchmarks.
MX-LSTM is particularly effective when people move slowly, i.e., the most
challenging scenario for all other models. The proposed approach also allows
for accurate predictions over a longer time horizon.
Comment: Accepted at IEEE Transactions on Pattern Analysis and Machine
Intelligence 2019. arXiv admin note: text overlap with arXiv:1805.0065
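Using head orientation as a proxy for visual attention suggests that, when pooling social context, a pedestrian should mainly consider the people they are looking at. A minimal sketch of such a filter, assuming a view cone centred on the head orientation; the 90-degree aperture, the 5-metre range and the hard in/out selection are hypothetical choices, not the paper's exact pooling scheme:

```python
import numpy as np


def neighbors_in_view(positions, head_angles, idx, fov_deg=90.0, max_range=5.0):
    """Indices of people inside the view cone of person `idx`.

    The cone is centred on head_angles[idx] (radians), with full aperture
    fov_deg and depth max_range metres; both values are hypothetical.
    """
    pos = np.asarray(positions, dtype=float)
    half = np.deg2rad(fov_deg) / 2.0
    heading = np.array([np.cos(head_angles[idx]), np.sin(head_angles[idx])])
    selected = []
    for j in range(len(pos)):
        if j == idx:
            continue
        d = pos[j] - pos[idx]
        dist = np.linalg.norm(d)
        if dist == 0 or dist > max_range:
            continue  # skip coincident points and people beyond the cone depth
        cos_angle = np.clip(d @ heading / dist, -1.0, 1.0)
        if np.arccos(cos_angle) <= half:
            selected.append(j)
    return selected
```

Usage: a person at the origin looking along +x sees a neighbour two metres ahead but not one two metres behind, so `neighbors_in_view([(0, 0), (2, 0), (-2, 0), (0, 10)], [0.0] * 4, 0)` returns `[1]`. A soft weighting by angle would be a natural alternative to the hard cone.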
Recognition self-awareness for active object recognition on depth images
We propose an active object recognition framework that introduces recognition self-awareness, an intermediate level of reasoning used to decide which views to cover during object exploration. The framework is built by first learning a multi-view deep 3D object classifier; subsequently, a dense 3D saliency volume is generated by fusing single-view visualization maps, the latter obtained by computing the gradient map of the class label on different image planes. The saliency volume indicates which object parts the classifier considers most important for deciding a class. Finally, the volume is injected into the observation model of a Partially Observable Markov Decision Process (POMDP). In practice, the robot decides which views to cover depending on the expected ability of the classifier to discriminate an object class by observing a specific part. For example, the robot will look for the engine to discriminate between a bicycle and a motorbike, since the classifier has found that part to be highly discriminative. Experiments are carried out on depth images with both simulated and real data, showing that our framework predicts the object class with higher accuracy and lower energy consumption than a set of alternative approaches.
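The core loop of such a framework maintains a belief over object classes and picks the next view expected to reduce uncertainty the most. A minimal sketch, assuming each view has an observation model P(obs | class, view) given as a matrix (in the paper this model is informed by the saliency volume; here it is a hypothetical input) and using a greedy one-step expected-entropy criterion, which is a common myopic approximation rather than a full POMDP solution and ignores energy cost:

```python
import numpy as np


def belief_update(belief, obs_model, view, obs):
    """Bayes update of the class belief after observing `obs` from `view`.

    obs_model[view] is an (n_classes, n_obs) matrix of P(obs | class, view).
    """
    posterior = belief * obs_model[view][:, obs]
    return posterior / posterior.sum()


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution, ignoring zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def best_next_view(belief, obs_model, views):
    """Greedy choice of the view with the lowest expected posterior entropy."""
    best_view, best_h = None, np.inf
    for v in views:
        p_obs = belief @ obs_model[v]  # predictive distribution over observations
        expected_h = sum(
            p_obs[o] * entropy(belief_update(belief, obs_model, v, o))
            for o in range(len(p_obs))
            if p_obs[o] > 0
        )
        if expected_h < best_h:
            best_view, best_h = v, expected_h
    return best_view
```

Usage, mirroring the bicycle/motorbike example with made-up numbers: if view 0 shows the engine (`[[0.9, 0.1], [0.1, 0.9]]`) and view 1 is uninformative (`[[0.5, 0.5], [0.5, 0.5]]`), then from a uniform belief `best_next_view` selects view 0, and observing outcome 0 there moves the belief to `[0.9, 0.1]`.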